This comic strip…
… has lead to a test called Bechdel test. It aims to address Hollywood’s gender bias and applies to movies. It consists in two simple questions:
Does it have at least two named female characters ?
And do those characters have at least one conversation that is not about a man?
“Easy ! All movies passes the test !” Well…Unfortunately, not at all. Although “The rule” was written in the mid-80’s, a third of the top 50 movies of 2016 failed the Bechdel Test.
On december 21, 2017 the pure player fivethirtyeight.com publishes an article on the new Bechdel test : https://projects.fivethirtyeight.com/next-bechdel/
The authors “reached out to more than a dozen women in film and television — writers, directors, actresses and producers — to ask what they think the next Bechdel Test should be”. The results are quite different considering how they tackle the problem.
That’s what inside the “nextBechdel_allTests.csv” file from the fivethirtyeight github repository : https://github.com/fivethirtyeight/data/tree/master/next-bechdel
You can fetch the data using:
library(readr)
new_bechdel <- read_csv("https://raw.githubusercontent.com/fivethirtyeight/data/master/next-bechdel/nextBechdel_allTests.csv")
We would like you to build a new Bechdel test too. Build a test that examine the movie and/or behind the camera and tries to “measure” the gender bias. Experiment it on the top 50 movies of 2016. To do so, no holds barred. You can either use the provided data or either add exogeneous data (please provide the sources).
Here are the lines to import the provided data in fivethirtyeight’s project:
cast_gender <- read_csv("https://raw.githubusercontent.com/fivethirtyeight/data/master/next-bechdel/nextBechdel_castGender.csv")
crew_gender <- read_csv("https://raw.githubusercontent.com/fivethirtyeight/data/master/next-bechdel/nextBechdel_crewGender.csv")
Finely describe the process that lead you to the score. What choices did you made ? Why ? How did you get into the data to confirm or invalidate your hypothesis ?
Please, submit your work in shape of a zipped folder containing an RStudio project to xhec2018@thinkr.fr Deadline : 4th november, 11:00 PM The name of the project should be “bechdel_firstname_lastname”. The project must contain a .Rmd file and the corresponding rendered .html file. Packages should be loaded at the top of the .Rmd file. Exogenous data must be included if you use some (provide the file or provide the instructions to build them) The whole thing should be reproducible.
| criteria | points |
|---|---|
| sent the project to the correct mail address | 1 |
| project name has the correct naming convention | 1 |
| the project contains an .Rmd and .html file | 1 |
| the .Rmd file “knits” | 2 |
| the .Rmd file contains relative paths only | 1 |
| the .rmd file is heavily commented | 4 |
| it shows at least 2 explorations to build a new test | 3 |
| it shows at least 2 graphs as a result of these explorations | 2 |
| each graph as a title, labeled axis and a comment on it’s content | 2 |
| a short paragraph describes the strength and weakness of your test | 3 |
| bonus points for using exogenous data | 2 |
update 2018-10-15: in the scoregrid, “exploration” means “show some of the wandering you made to build a new test”. Let say you want to explore “A film fails if lesser than 40% of the cast and lesser than 40% of the crew are women”. To do so, you’re going to calculate how many films pass the test, and why. Therefore, maybe you’re going to look which films are concerned, maybe you’re going to test 45% instead of 40%. Maybe proportion is not relevant and has side effect so you’re going to adapt your test in something like “A film fails if lesser than 40% (or at least 5 characters if the cast counts lesser than 30 people) of the cast and lesser than 40% of the crew are women”. We saw that data visualisation works hand in hand with data description, that’s why you should show “at least 2 graphs as a result of these explorations”